The Evolution of the Hedge Fund

Guest Lecture for MIT 18.S096
Topics in Mathematics with Applications in Finance

Jonathan Larkin

October 2, 2025

Disclaimer

This presentation is for informational purposes only and reflects my personal views and interests. It does not constitute investment advice and is not representative of any current or former employer. The information presented is based on publicly available sources. References to specific firms are for illustrative purposes only and do not imply endorsement.

About Me

Managing Director at Columbia Investment Management Co., LLC, generalist allocator, Data Science and Research lead. Formerly CIO at Quantopian, Global Head of Equities at Millennium Management LLC, and Co-Head of Equity Derivatives Trading at JPMorgan.

This presentation is available at github.com/marketneutral/hedge_fund_evolution.

What Evolution?

Two trends

  • Unbundling
  • Human + Machine Collaboration

Theory

Condorcet Jury Theorem (1785)

  • The Condorcet Jury Theorem states that if each member of a jury has a probability greater than 1/2 of making the correct decision, then as the number of jurors increases, the probability that the majority decision is correct approaches 1.

\[ P(\text{majority correct}) \to 1 \text{ as } n \to \infty, \quad \text{given } p > \tfrac{1}{2} \text{ and independent errors} \]

  • e.g., sklearn.ensemble.VotingClassifier relies on this result.
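
A quick numerical check of the theorem, computing the exact binomial probability of a correct majority for a juror accuracy of p = 0.6:

```python
from math import comb

def p_majority_correct(p: float, n: int) -> float:
    """Exact probability that a majority of n independent jurors,
    each correct with probability p, reaches the right decision (n odd)."""
    k_min = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# With p = 0.6, majority accuracy climbs toward 1 as the jury grows.
for n in (1, 11, 101):
    print(n, round(p_majority_correct(0.6, n), 4))
```

Note the result depends on the independence assumption: perfectly correlated jurors are no better than one juror.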

Boosting Weak Learners (1988)

  • Kearns, Michael. Thoughts on Hypothesis Boosting. 1988.
  • Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. 2001.
  • Sequentially train many “weak learner” models, each focusing on the errors of the previous ones.
  • Gradient boosted decision trees remain the dominant approach for tabular machine learning today.
  • e.g., sklearn.ensemble.HistGradientBoostingClassifier, xgboost, lightgbm, catboost

Boosting in a Nutshell

  • \(F_M\) is the ensemble model. After M rounds: \[ F_M(x) = F_0(x) + \sum_{m=1}^M \gamma\, h_m(x) \]
  • Each round fits \(h_m\) to the negative gradient of the loss at \(F_{m-1}\), then updates: \[ F_m(x) = F_{m-1}(x) + \gamma\, h_m(x) \]
  • \(\gamma\) is the learning rate; \(h_m\) is a weak learner (e.g., shallow tree).
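
A minimal sketch of the update rule above on synthetic data, using shallow scikit-learn trees as the weak learners \(h_m\) (squared loss, so the negative gradient is just the residual):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=500)

# F_0: for squared loss the optimal constant model is the mean.
F = np.full_like(y, y.mean())
gamma, trees = 0.1, []

for m in range(100):
    # Negative gradient of squared loss at F_{m-1} is the residual y - F.
    h = DecisionTreeRegressor(max_depth=2).fit(X, y - F)
    F += gamma * h.predict(X)  # F_m = F_{m-1} + gamma * h_m
    trees.append(h)

mse = float(np.mean((y - F) ** 2))
print(mse)  # training error shrinks toward the noise floor
```

Production libraries (xgboost, lightgbm) add regularization, second-order information, and histogram-based splitting, but the core loop is this one.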

Model Stacking (1992)

  • Wolpert, David H. Stacked Generalization. 1992.
  • Train “meta-model” on the predictions of independent base models.
  • Works best when base models are diverse and capture different aspects of the data.
  • e.g., sklearn.ensemble.StackingClassifier

Stacking in a Nutshell

  • Combine several different models by training a meta-model on their predictions.
    • Train M independent base models \((f_1, \dots, f_M)\) (e.g., linear model, tree, neural net, etc.).
    • Using an appropriate cross validation scheme, collect out-of-fold predictions for each training example to avoid leakage.
    • Train a meta-model \((g)\) on these predictions (optionally with the original features). \[ \hat{y}(x) = g\!\big(f_1(x),\, f_2(x),\, \dots,\, f_M(x)\big) \]
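
The recipe above can be sketched with `cross_val_predict` supplying the out-of-fold predictions (synthetic data; two base models for brevity):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
base = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=4, random_state=0),
]

# Out-of-fold predictions: each row is scored by a model that never saw it.
Z = np.column_stack([
    cross_val_predict(f, X, y, cv=5, method="predict_proba")[:, 1] for f in base
])

meta = LogisticRegression().fit(Z, y)  # meta-model g on base predictions
print(accuracy_score(y, meta.predict(Z)))
```

This is essentially what `sklearn.ensemble.StackingClassifier` automates, including refitting the base models on the full training set for inference.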

Ensemble Methods Summary

  • Voting: combine models, majority vote.
  • Boosting: sequentially build models, each correcting the previous.
  • Stacking: combine diverse models, leveraging their strengths.
    • Model Averaging is a special case of stacking: the meta-model is a weighted linear sum.

Stacking into Boosting

  • Why not both?
from lightgbm import LGBMClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

lin = Pipeline([
    ("scaler", StandardScaler()),
    ("lr", LogisticRegression(max_iter=1000)),
])

# Stack the linear base model into a gradient-boosted meta-model: the LGBM
# final estimator boosts on the linear model's out-of-fold probabilities
# plus (via passthrough=True) the original features.
stack = StackingClassifier(
    estimators=[("lin", lin)],
    final_estimator=LGBMClassifier(),
    stack_method="predict_proba",
    passthrough=True,
    cv=cv,
)

stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

The Dunbar Number (1992)

  • Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution, 22(6), 469–493.
  • Maximum number of stable social relationships a human can maintain ≈ 150
  • Limit of trust & cohesion
  • Beyond limit → silos, slow decisions, culture strain

Dunbar cont’d: How Hedge Funds Manage It

  • 👉 Scale by respecting Dunbar
    • Pods → small teams, central risk
    • Quant → structure as an assembly line
    • Lean → keep a cap on size, preserve culture
    • Bureaucracy → heavy process to scale

Wisdom of Crowds (2004)

  • Surowiecki, James. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. Doubleday, 2004.
  • For the crowd to be smarter than experts, we require
    • Diversity of opinion → different perspectives reduce blind spots
    • Independence of members → avoid groupthink
    • Decentralization → empower local knowledge
    • Aggregation of information → combine insights effectively
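
A toy simulation of why the independence condition matters: averaging many unbiased, independent estimates shrinks squared error roughly by a factor of the crowd size (correlated errors would erase most of this benefit):

```python
import numpy as np

rng = np.random.default_rng(1)
truth = 100.0
n_members, n_trials = 50, 2000

# Each member's estimate = truth + independent noise (diverse, unbiased views).
estimates = truth + rng.normal(0, 10, size=(n_trials, n_members))

individual_mse = np.mean((estimates - truth) ** 2)          # typical member error
crowd_mse = np.mean((estimates.mean(axis=1) - truth) ** 2)  # error of the average

print(individual_mse, crowd_mse)  # crowd error ≈ individual error / n_members
```

With groupthink (shared noise), the averaging gain collapses, which is exactly why Surowiecki lists independence and decentralization as preconditions.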

The Common Task Framework (2007-)

  • Donoho, D. (2017). “50 Years of Data Science.” Journal of Computational and Graphical Statistics, 26(4), 745–766.
    • Define a clear task (e.g., image recognition).
    • Provide dataset + ground truth labels + hidden test set.
    • Set evaluation metric (accuracy, F1, etc.).
    • Run open competition among researchers.
  • Netflix Prize (2006), Kaggle (2010), ImageNet (2012)…

Common Task Framework (cont’d)

  • “The Kaggle Grandmasters Playbook: 7 Battle-Tested Modeling Techniques for Tabular Data”, September 18, 2025, Nvidia Blog, link.

Machine, Platform, Crowd (2017)

  • Andrew McAfee and Erik Brynjolfsson. Machine, Platform, Crowd: Harnessing Our Digital Future. W. W. Norton & Company, 2017.
    • Wisdom of crowds: groups can outperform individual experts
    • Platforms unlock assets (Uber, Airbnb)
    • Innovation from open-source & collaboration
    • Trust via ratings (leaderboards)
    • Success is \(f(\text{incentives}, \text{governance})\)

Theory Takeaways

  • Successes in machine learning demonstrate the critical importance of ensemble methods.
  • The Common Task Framework has driven scientific progress at scale.
  • Social science principles can inform the design of incentives and processes that harness collective intelligence.

The Traditional Hedge Fund

Quant Equity Workflow

  • Larkin, Jonathan R., “A Professional Quant Equity Workflow”, Quantopian Blog, 2016, link.
  • Separate teams are focused along an assembly line
    • Data acquisition
    • Alpha research (aka feature engineering)
    • Signal combination (aka modeling)
    • Risk and transaction cost modeling
    • Portfolio construction (aka optimization)
    • Execution

Quant Equity Workflow

  • Hope, Bradley. “With 125 Ph.D.s in 15 Countries, a Quant ‘Alpha Factory’ Hunts for Investing Edge.” Wall Street Journal, April 5, 2017. link

Quant Equity Workflow

flowchart LR

    DATA(Data) --> UDEF(Universe Definition)

    UDEF --> A1(alpha 1)
    UDEF --> A2(alpha 2)
    UDEF --> ADOTS(alpha...)
    UDEF --> AN(alpha N)

    A1 --> ACOMBO(Alpha Combination)
    A2 --> ACOMBO
    ADOTS --> ACOMBO
    AN --> ACOMBO

    DATA --> TARGET(Target)
    TARGET --> ACOMBO
    TARGET --> PCON
    DATA --> RISK(Risk & T-Cost Models)

    ACOMBO --> PCON(Optimization)
    RISK --> PCON

    PROD{{t-1 Portfolio}} --> PCON
    PCON --> IDEAL{{Ideal Portfolio}}
    IDEAL --> EXEC
    
    EXEC(Execution)

Workflow: Minimal Non-Trivial Implementation

  • Craft four simple alphas (momentum, reversal, quality, value)
  • Create a target (forward 5-day return, cross-sectionally demeaned)
  • Combine the alphas with a linear model
  • Use cvxportfolio machinery for the risk model, t-cost model, and optimization
  • cvxportfolio repo on GitHub
  • Boyd, Stephen, et al. “Multi‑Period Trading via Convex Optimization.” Foundations and Trends in Optimization, vol. 3, no. 1, 2017, pp. 1–76.
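
A hedged sketch of the alpha-combination step on synthetic data — the alpha values, target construction, and coefficients below are illustrative stand-ins, not the workflow's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000  # stock-date observations

# Hypothetical stand-ins for the four alphas; real versions would be
# computed from price and fundamental data per the workflow above.
alphas = pd.DataFrame(
    rng.normal(size=(n, 4)), columns=["momentum", "reversal", "quality", "value"]
)
# Target: forward 5-day return, cross-sectionally demeaned (synthetic here).
target = 0.02 * alphas["momentum"] - 0.01 * alphas["reversal"] + rng.normal(0, 0.05, n)

model = LinearRegression().fit(alphas, target)
combined = model.predict(alphas)  # the combined alpha fed to the optimizer
print(dict(zip(alphas.columns, model.coef_.round(4))))
```

In practice the fit would use an out-of-sample scheme (expanding or rolling windows) before the combined signal is handed to the optimizer.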

Unbundling

Introduction

  • Traditional hedge funds: small teams, opaque processes, high fees
  • New paradigm: decentralized hedge funds
    • Crowdsourced intelligence
    • Blockchain-based incentives
    • Global participation
  • Case studies: Numerai, Yiedl, CrowdCent

Numerai – Overview

  • San Francisco–based hedge fund (founded 2015)
  • Crowdsources stock market predictions from global data scientists
  • Uses a meta-model: an ensemble of community models aggregated into a live trading strategy
  • Mission: “solve the hardest problem in finance” with diverse, anonymized datasets

Numerai – Community & Competitions

  • Weekly data science tournaments
  • Participants submit ML predictions on anonymized market data
  • Scored on correlation with actual returns
  • Staking with NMR tokens:
    • Stake NMR = confidence in predictions
    • Rewards for accuracy, penalties for errors

Numerai – Data & Roles

  • Company: curates data, manages fund, constructs stake-weighted meta-model
  • Community: builds models, submits predictions, stakes tokens
  • Data: anonymized, obfuscated stock/market datasets
  • Company executes trades; community generates intelligence
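
For illustration only (not Numerai's actual aggregation logic), a stake-weighted meta-model can be sketched as a weighted average of submitted predictions, with weights proportional to the NMR staked behind each model:

```python
import numpy as np

# Hypothetical submissions: rows = community models, columns = stocks.
predictions = np.array([
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.4],
    [0.1, 0.9, 0.6],
])
stakes = np.array([10.0, 30.0, 5.0])  # NMR staked behind each model

weights = stakes / stakes.sum()       # more stake -> more influence
meta = weights @ predictions          # stake-weighted prediction per stock
print(meta)
```

The staking mechanism makes the weights incentive-compatible: confident, accurate modelers earn influence, while inaccurate ones lose stake and weight.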

Numerai – Role of Crypto

  • Numeraire (NMR): first hedge-fund-issued cryptocurrency
  • Staking aligns incentives: accurate models gain rewards, inaccurate lose stake
  • All payouts and stakes occur via Ethereum smart contracts
  • Supports trustless, global participation

Yiedl – Overview

  • Founded 2023, fully DAO-based hedge fund
  • Mission: replace fund managers with blockchain data science tournaments
  • Operates via on-chain vaults on Optimism + Synthetix
  • Managed by token holders, not a centralized GP

Yiedl – Community & Competitions

  • Weekly crypto-asset prediction tournaments
  • Participants forecast asset returns, stake Yiedl tokens
  • Predictions stored/encrypted on IPFS & blockchain
  • Smart contracts auto-reward/penalize based on accuracy
  • Two tracks:
    • Neutral vault: rank assets for market-neutral strategy
    • UpDown vault: propose portfolio weights

Yiedl – Data & Roles

  • Data: curated decade-long crypto datasets (prices, on-chain metrics, sentiment)
  • Community: builds predictive models, stakes tokens, governs DAO
  • Company/DAO: develops infrastructure, aggregates predictions, executes trades via smart contracts
  • Division: community = what to trade, DAO = how to implement

Yiedl – Role of Crypto

  • YIEDL token:
    • Staking in competitions
    • Governance (DAO voting)
    • Rewards for accurate predictions
  • All trades executed via DeFi protocols (Synthetix)
  • On-chain fund = transparent, auditable, permissionless

CrowdCent – Overview

  • Hybrid crowdsourced investment platform (2022–2023)
  • Combines fundamental analysts + data scientists
  • Mission: democratize asset management with human + machine intelligence
  • “Next-generation of investing: decentralized, systematized, democratized”

CrowdCent – Community & Competitions

  • Analysts: share investment theses (e.g. via SumZero)
  • Data Scientists: join open ML challenges to evaluate ideas
  • Examples:
    • Hyperliquid Challenge: rank crypto assets by 10–30d returns
    • Equity NLP Challenge: analyze analyst reports for predictive alpha
  • Leaderboards, percentile scoring, meta-model aggregation

CrowdCent – Data & Roles

  • Data:
    • Fundamental research from SumZero (analyst reports, ratings)
    • Market + crypto data, engineered features
  • Community:
    • Analysts generate qualitative ideas
    • Data scientists build quantitative models
  • Company: runs competitions, integrates data, manages portfolio construction & execution

CrowdCent – Role of Crypto

  • No native token (as of 2025)
  • Crypto as an asset class: community builds crypto strategies (Hyperliquid challenge)
  • Integration with crypto projects:
    • Fund that stakes in Numerai’s NMR ecosystem
    • Bridges decentralized funds together
  • Less “on-chain” than Yiedl, but embraces crypto markets and ethos

Conclusion

  • Numerai: pioneered crypto-incentivized crowdsourced hedge fund
  • Yiedl: DAO-based, on-chain hedge fund built entirely with DeFi
  • CrowdCent: blends human fundamental research with ML and crypto strategies
  • Common threads:
    • Community-driven intelligence
    • Machine learning aggregation
    • Cryptocurrency as incentive + infrastructure
  • Future hedge funds: open, decentralized, global

Human + Machine

Types of Collaboration

  • Horizontal
  • Vertical

Horizontal

  • Human forecasts concatenated with machine forecasts
  • Fit model on both

Vertical

  • Stepwise: human first, machine second (or vice versa)

Motivation

  • Canonical view: “Public” info is free and instantly priced (EMH).
  • Reality: Converting public data → stock picks is costly (analysts, data prep, modeling, infra, time).
  • Research question: How large are the economic costs of processing public information?

Study Design (High Level)

  • Population: 3,337 active, diversified U.S. equity mutual funds (1990–2020).
  • Build an AI analyst that uses only public data and realistic constraints.
  • Compare:
    • Human manager (actual holdings)
    • AI-modified (hybrid; selective replacements)
    • AI-only (full replacement within style constraints)
  • Measure: Dollar alpha and Sharpe; treat forgone gains as lower bound of managers’ marginal info-processing costs.

Key Findings (Punchline)

  • AI-modified: +$17.1M incremental per quarter vs. human; ~93% of managers outperformed over their lifetimes.
  • AI-only: +$17.2M incremental per quarter; ~42% allocated to style indices without hurting results.
  • Sharpe improves materially (details ahead).
  • Results robust to risk models, transaction costs, benchmark identification, and incentive heterogeneity.

Conceptual Framework

  • Investors process info until marginal cost = marginal gain.
  • If AI (public info only) can improve a manager’s portfolio under the manager’s constraints, the forgone gains represent a lower-bound on the manager’s marginal info costs.
  • Human + machine is evaluated within fund style/risk/size/liquidity constraints.

Data Inputs (Public Only)

  • Market (CRSP): prices, returns (incl. delisting), volatility, beta, liquidity/volume, market cap.
  • Accounting (Compustat point-in-time; proper lags): profitability, accruals, leverage, growth, etc.
  • Analysts (I/B/E/S): consensus EPS, recommendations, price targets (careful timing; summary files).
  • Text (EDGAR): 10-K/Q and 8-K tone/complexity/uncertainty (point-in-time release).
  • Macro (Welch–Goyal set): term/default spreads, D/P, E/P, etc.
  • Ratings: S&P long-term issuer ratings.
  • ~170 features; winsorization by feature type; point-in-time construction to avoid look-ahead.

Return Target & Benchmarking

  • Predict quarter-ahead stock returns:
    • Preferred: DGTW benchmark-adjusted returns (size × B/M × momentum groups).
    • Also considered: excess returns vs. risk-free.
  • Portfolio construction and evaluation are within DGTW groups to preserve style.
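
A toy illustration of the benchmark adjustment in pandas — equal-weighted group means here stand in for the value-weighted DGTW characteristic portfolios used in the paper:

```python
import pandas as pd

# Toy panel: returns with size x B/M x momentum group labels (DGTW-style bins).
df = pd.DataFrame({
    "stock": ["A", "B", "C", "D"],
    "group": ["g1", "g1", "g2", "g2"],
    "ret":   [0.05, 0.01, -0.02, 0.04],
})

# Benchmark-adjusted return: subtract each stock's characteristic-group mean.
df["dgtw_adj"] = df["ret"] - df.groupby("group")["ret"].transform("mean")
print(df)
```

Working within groups like this is what lets the AI portfolios swap names while preserving the fund's style exposure.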

ML Pipeline (Pseudocode)

# =========================
# FEATURES & LABELS
# =========================
for each month t in 1980..2020:
    X_t := features available by end of t-1 (market, accounting (lagged), IBES, EDGAR text, macro, ratings)
    y_t := realized stock return over t+1..t+3 (DGTW-adjusted preferred)

# Preprocess within training folds only:
# - Winsorize features (type-specific rules)
# - Impute missing (numeric: quarter-mean; categorical/flags: 0)
# - Standardize numeric using train stats

# =========================
# ROLLING EXPANDING TRAINING
# =========================
for prediction year Y in 1986..2020:
    train_period := 1980-01 .. (Y-1)-10  # end in Oct Y-1 to avoid quarter overlap
    # time-series cross-validated randomized hyperparameter search (RF depth, trees, min split/leaf, mtry)
    fit RandomForest_Y on train_period to minimize validation MSE

    # =========================
    # PREDICT MONTHLY IN YEAR Y
    # =========================
    for each month t in Y:
        ŷ_t := RandomForest_Y.predict(X_t)
        # also compute within-DGTW ranks/deciles for portfolio rules

# Optionally also fit a Neural Network model; ensemble via average rank to get a small lift.
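
The rolling expanding-window loop above can be sketched in scikit-learn on toy data (simplified: a plain year cutoff instead of the paper's October lag, and a tiny hyperparameter grid):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
n, p = 600, 10                                   # toy monthly panel
X = rng.normal(size=(n, p))
y = 0.1 * X[:, 0] + rng.normal(0, 0.5, n)
year = np.repeat(np.arange(1980, 2030), 12)[:n]  # month -> calendar year

param_dist = {"max_depth": [3, 5, 8], "n_estimators": [50, 100]}

preds = {}
for Y in range(2015, 2018):                      # a few prediction years
    train = year < Y                             # expanding window up to Y-1
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=0),
        param_dist, n_iter=3, cv=TimeSeriesSplit(n_splits=3),
        scoring="neg_mean_squared_error", random_state=0,
    ).fit(X[train], y[train])
    preds[Y] = search.best_estimator_.predict(X[year == Y])
```

`TimeSeriesSplit` keeps validation folds strictly after their training data, the same leakage-avoidance idea as the pseudocode's time-series CV.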

What Drives Predictions?

Permutation importance analysis shows simple features are highly influential:

  • Market value, dollar volume, trading activity, earnings-forecast signals.
  • RF captures nonlinear interactions among simple predictors.
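
A minimal permutation-importance example with scikit-learn, on synthetic data where one feature dominates by construction:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.1, 500)  # feature 0 dominates

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffle one column at a time and measure the drop in score.
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print(result.importances_mean.round(3))  # feature 0 should rank first
```

The same diagnostic, run on the paper's ~170 features, is what surfaces size, volume, and earnings-forecast signals as the dominant drivers.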

Portfolio Construction – Shared Constraints

  • No shorting; quarterly rebalance only.
  • Within-style swaps: replacements must come from the same DGTW group.
  • Depth/liquidity: cap any single holding to ≤ 20% of the stock’s market cap (clip and keep overflow).
  • No duplicate replacements (“without replacement” within a fund/quarter).
  • Payout convention: AI’s incremental gain is paid out quarterly so human and AI start next quarter with equal AUM (conservative for AI in dollars).

AI-Modified (Hybrid) – Pseudocode

inputs:
  w_h[j]      # human start-of-quarter weight for stock j
  g[j]        # DGTW group of stock j
  decile[j]   # predicted decile within g[j] (1=worst ... 10=best)
  ŷ[j]        # predicted DGTW-adjusted return
  NAV         # fund net asset value at start of quarter

initialize:
  w_ai := w_h
  used := ∅    # prevent duplicate use of replacement names

# Keep strong human picks
for j in holdings sorted by descending w_h[j]:
    if decile[j] == 10:
        used.add(j)

# Attempt upgrades for others (largest positions first)
for j in holdings sorted by descending w_h[j]:
    if decile[j] in 1..9:
        group := g[j]
        C := { s in group | decile[s] == 10 and s ∉ used }
        if C ≠ ∅:
            # choose best candidate
            k := argmax_s∈C ŷ[s]
            target_value := w_h[j] * NAV
            max_value := 0.20 * market_cap(k)
            delta_value := min(target_value, max_value)
            w_ai[k] +=  delta_value / NAV
            w_ai[j] -=  delta_value / NAV
            used.add(k)

# Replace remaining bottom-decile names with the group index
for j in holdings:
    if decile[j] == 1 and w_ai[j] > 0:
        idx := index_for_group(g[j])
        w_ai[idx] += w_ai[j]
        w_ai[j]   = 0

# Normalize / clean
project w_ai onto the simplex (weights ≥ 0, sum = 1)

AI-Only (Full Replacement) – Pseudocode

inputs as above
initialize:
  w_ai := 0
  used := ∅
  backlog[group] := 0 for all groups

# Map each human slot to a top-decile name in the same group
for j in human holdings sorted by descending w_h[j]:
    group := g[j]
    C := { s in group | decile[s] == 10 and s ∉ used }
    if C ≠ ∅:
        k := argmax_s∈C ŷ[s]
        target_value := w_h[j] * NAV
        max_value := 0.20 * market_cap(k)
        delta_value := min(target_value, max_value)
        w_ai[k] += delta_value / NAV
        used.add(k)
        # any leftover because of cap stays to be assigned:
        if delta_value < target_value:
            backlog[group] += (target_value - delta_value) / NAV
    else:
        backlog[group] += w_h[j]

# Push any leftover weight to the group index
for each group:
    if backlog[group] > 0:
        idx := index_for_group(group)
        w_ai[idx] += backlog[group]

project w_ai onto the simplex

Performance Measurement – Dollar Alpha

Let \(R^m_{i,q}\) be fund \(i\)'s gross return from observed holdings in quarter \(q\), and \(R^b_{i,q}\) the corresponding DGTW benchmark return (same weights, matched groups).

Dollar alpha (human):

\[ V_{i,q} = \text{Assets}_{i,q-1}\,\big(R^m_{i,q} - R^b_{i,q}\big) \]

Incremental dollars (AI over human):

\[ Z_{i,q} = \text{Assets}_{i,q-1}\,\big(R^{AI}_{i,q} - R^m_{i,q}\big) \]

Aggregate to lifetime averages per fund, then time-weight across funds.

Sharpe Ratio (Definition & Use)

For fund \(i\) with quarterly excess returns \(ER_{i,t} = R_{i,t} - R^{rf}_t\):

\[ \text{Sharpe}_i = \frac{\overline{ER_i}}{\sigma(ER_{i,t})} \times \sqrt{4} \]

(The \(\sqrt{4}\) annualizes quarterly observations.) Compare paired Sharpe differences:

\[ \Delta\text{Sharpe}_i = \text{Sharpe}^{AI}_i - \text{Sharpe}^{Human}_i \]
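
A small numeric sketch of the paired comparison on hypothetical quarterly series (annualizing with √4; the uniform 0.004-per-quarter uplift is purely illustrative):

```python
import numpy as np

def quarterly_sharpe(excess_returns, periods_per_year=4):
    """Annualized Sharpe ratio from quarterly excess returns."""
    er = np.asarray(excess_returns, dtype=float)
    return er.mean() / er.std(ddof=1) * np.sqrt(periods_per_year)

# Hypothetical fund: human quarterly excess returns, and an AI-modified
# version that adds a small uniform uplift each quarter.
human = np.array([0.01, -0.02, 0.03, 0.015, 0.005, -0.01, 0.02, 0.01])
ai = human + 0.004

delta = quarterly_sharpe(ai) - quarterly_sharpe(human)
print(round(delta, 3))
```

Because the uplift here is a constant, the volatility is unchanged and the Sharpe difference comes entirely from the higher mean.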

Findings:

  • Avg human Sharpe ≈ 0.47.
  • Avg paired difference (AI-modified − human) ≈ +0.17 (t ≈ 69).
  • ~95% of funds show a positive Sharpe improvement.
  • Similar uplifts for AI-only.

Risk Adjustments & Robustness

  • Factor alphas: FF5 + momentum, and with mispricing factors → AI outperformance persists or widens.
  • Sharpe and CDF dominance: AI nearly first-order stochastically dominates human quarterly performance.
  • Transaction costs:
    • Implied quarterly turnover: Human ~20%; AI-modified ~52% stocks + 3% indices; AI-only ~40% stocks + 10% indices.
    • Institutional cost estimates (≈13–37 bps per stock trade) → AI still dominates net of costs.
  • Benchmarks: Results hold when restricting to funds with verified size/value benchmarks and style-consistent holdings.
  • Incentives: Results hold for direct-sold (sophisticated) funds and for funds flagged as quantitative.

Why Human + Machine?

  • Public-signal extraction is a skill; AI scales nonlinearity & interactions across simple signals.
  • Constrained, style-consistent AI can deliver practical outperformance (not just paper alpha).

Hybrid process:

  • Humans: mandate, domain context, governance, risk/compliance, private info where valuable.
  • Machines: breadth, speed, nonlinearity, consistent execution, monitoring, "what-ifs".

Implementation Tips (Practitioner Slide)

  • Data ops: point-in-time joins; strict lags; feature health checks; backtest hygiene (no leakage).
  • Modeling: rolling expanding windows; time-series CV; monitor drift; ensemble RF/NN ranks.
  • PM integration: within-style candidate pools; depth caps; turnover budgets; quarterly (or monthly) cadence; explicit payout/compounding policy.
  • Risk: side-by-side factor loads; pre/post-trade limits; stress; transaction-cost controls.

Limitations & Scope

  • Results are partial-equilibrium (single fund adopting AI). Widespread adoption → greater price impact, lower marginal gains.
  • Lower-bound cost estimate: conservative sizing; quarterly payout prevents compounding of AI gains; RF today < RF+NN tomorrow.
  • Private information could still add value; study quantifies public-info processing costs.

Takeaways

  • Managers left economically large returns on the table by not further processing public data.
  • Human + machine collaboration substantially improves risk-adjusted performance under real constraints.
  • Challenges the idea that public info is “free” to use; processing costs are material.
  • As AI improves, the shadow price of public information likely rises.